-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LLVM][IR] Add location tracking to LLVM IR parser #155797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Nikita Popov <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter. |
@llvm/pr-subscribers-llvm-ir Author: Bertik23 (Bertik23) ChangesThis PR is part of the LLVM IR LSP server project (RFC) To be able to make a LSP server, it's crutial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions). This PR adds:
The AsmParserContext can be passed as an optional parameter into the parser. Which populates it and it can be then used by other tools, such as the LSP server. The AsmParserContext idea was borrowed from MLIR. As we didn't want to store data no one else uses inside the objects themselves. But the implementation is different, this class holds several maps of Functions, BasicBlocks and Instructions, to map them to their location. The Lexer had all of it's manual pointer increments changed to getNextChar() which checks for linebreaks and updates the line and column counters accordingly. Patch is 39.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155797.diff 13 Files Affected:
diff --git a/llvm/include/llvm/AsmParser/AsmParserContext.h b/llvm/include/llvm/AsmParser/AsmParserContext.h
new file mode 100644
index 0000000000000..78ea32ac7ca08
--- /dev/null
+++ b/llvm/include/llvm/AsmParser/AsmParserContext.h
@@ -0,0 +1,53 @@
+//===-- AsmParserContext.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ASMPARSER_ASMPARSER_STATE_H
+#define LLVM_ASMPARSER_ASMPARSER_STATE_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/IR/Value.h"
+#include <optional>
+
+namespace llvm {
+
+/// Registry of file location information for LLVM IR constructs
+///
+/// This class provides access to the file location information
+/// for various LLVM IR constructs. Currently, it supports Function,
+/// BasicBlock and Instruction locations.
+///
+/// When available, it can answer queries about what is at a given
+/// file location, as well as where in a file a given IR construct
+/// is.
+///
+/// This information is optionally emitted by the LLParser while
+/// it reads LLVM textual IR.
+class AsmParserContext {
+public:
+ std::optional<FileLocRange> getFunctionLocation(const Function *) const;
+ std::optional<FileLocRange> getBlockLocation(const BasicBlock *) const;
+ std::optional<FileLocRange> getInstructionLocation(const Instruction *) const;
+ std::optional<Function *> getFunctionAtLocation(const FileLocRange &) const;
+ std::optional<Function *> getFunctionAtLocation(const FileLoc &) const;
+ std::optional<BasicBlock *> getBlockAtLocation(const FileLocRange &) const;
+ std::optional<BasicBlock *> getBlockAtLocation(const FileLoc &) const;
+ std::optional<Instruction *>
+ getInstructionAtLocation(const FileLocRange &) const;
+ std::optional<Instruction *> getInstructionAtLocation(const FileLoc &) const;
+ bool addFunctionLocation(Function *, const FileLocRange &);
+ bool addBlockLocation(BasicBlock *, const FileLocRange &);
+ bool addInstructionLocation(Instruction *, const FileLocRange &);
+
+private:
+ DenseMap<Function *, FileLocRange> Functions;
+ DenseMap<BasicBlock *, FileLocRange> Blocks;
+ DenseMap<Instruction *, FileLocRange> Instructions;
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/include/llvm/AsmParser/LLLexer.h b/llvm/include/llvm/AsmParser/LLLexer.h
index 501a7aefccd7f..5008ef029f3ff 100644
--- a/llvm/include/llvm/AsmParser/LLLexer.h
+++ b/llvm/include/llvm/AsmParser/LLLexer.h
@@ -29,6 +29,20 @@ namespace llvm {
const char *CurPtr;
StringRef CurBuf;
+ // The line number at `CurPtr-1`, zero-indexed
+ unsigned CurLineNum = 0;
+ // The column number at `CurPtr-1`, zero-indexed
+ unsigned CurColNum = -1;
+ // The line number of the start of the current token, zero-indexed
+ unsigned CurTokLineNum = 0;
+ // The column number of the start of the current token, zero-indexed
+ unsigned CurTokColNum = 0;
+ // The line number of the end of the current token, zero-indexed
+ unsigned PrevTokEndLineNum = -1;
+ // The column number of the end (exclusive) of the current token,
+ // zero-indexed
+ unsigned PrevTokEndColNum = -1;
+
enum class ErrorPriority {
None, // No error message present.
Parser, // Errors issued by parser.
@@ -62,9 +76,7 @@ namespace llvm {
explicit LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &,
LLVMContext &C);
- lltok::Kind Lex() {
- return CurKind = LexToken();
- }
+ lltok::Kind Lex() { return CurKind = LexToken(); }
typedef SMLoc LocTy;
LocTy getLoc() const { return SMLoc::getFromPointer(TokStart); }
@@ -79,6 +91,21 @@ namespace llvm {
IgnoreColonInIdentifiers = val;
}
+ // Get the current line number, zero-indexed
+ unsigned getLineNum() { return CurLineNum; }
+ // Get the current column number, zero-indexed
+ unsigned getColNum() { return CurColNum; }
+ // Get the line number of the start of the current token, zero-indexed
+ unsigned getTokLineNum() { return CurTokLineNum; }
+ // Get the column number of the start of the current token, zero-indexed
+ unsigned getTokColNum() { return CurTokColNum; }
+ // Get the line number of the end of the previous token, zero-indexed,
+ // exclusive
+ unsigned getPrevTokEndLineNum() { return PrevTokEndLineNum; }
+ // Get the column number of the end of the previous token, zero-indexed,
+ // exclusive
+ unsigned getPrevTokEndColNum() { return PrevTokEndColNum; }
+
// This returns true as a convenience for the parser functions that return
// true on error.
bool ParseError(LocTy ErrorLoc, const Twine &Msg) {
@@ -93,7 +120,12 @@ namespace llvm {
private:
lltok::Kind LexToken();
+ // Return closest pointer after `Ptr` that is an end of a label.
+ // Returns nullptr if `Ptr` doesn't point into a label.
+ const char *getLabelTail(const char *Ptr);
int getNextChar();
+ const char *skipNChars(unsigned N);
+ void advancePositionTo(const char *Ptr);
void SkipLineComment();
bool SkipCComment();
lltok::Kind ReadString(lltok::Kind kind);
diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index c01de4a289a69..02460e5e52203 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -13,6 +13,7 @@
#ifndef LLVM_ASMPARSER_LLPARSER_H
#define LLVM_ASMPARSER_LLPARSER_H
+#include "AsmParserContext.h"
#include "LLLexer.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/AsmParser/NumberedValues.h"
@@ -177,6 +178,9 @@ namespace llvm {
// Map of module ID to path.
std::map<unsigned, StringRef> ModuleIdMap;
+ /// Keeps track of source locations for Values, BasicBlocks, and Functions
+ AsmParserContext *ParserContext;
+
/// Only the llvm-as tool may set this to false to bypass
/// UpgradeDebuginfo so it can generate broken bitcode.
bool UpgradeDebugInfo;
@@ -189,10 +193,11 @@ namespace llvm {
public:
LLParser(StringRef F, SourceMgr &SM, SMDiagnostic &Err, Module *M,
ModuleSummaryIndex *Index, LLVMContext &Context,
- SlotMapping *Slots = nullptr)
+ SlotMapping *Slots = nullptr,
+ AsmParserContext *ParserContext = nullptr)
: Context(Context), OPLex(F, SM, Err, Context),
Lex(F, SM, Err, Context), M(M), Index(Index), Slots(Slots),
- BlockAddressPFS(nullptr) {}
+ BlockAddressPFS(nullptr), ParserContext(ParserContext) {}
bool Run(
bool UpgradeDebugInfo,
DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
diff --git a/llvm/include/llvm/AsmParser/Parser.h b/llvm/include/llvm/AsmParser/Parser.h
index c900b79665404..22b0881d92b53 100644
--- a/llvm/include/llvm/AsmParser/Parser.h
+++ b/llvm/include/llvm/AsmParser/Parser.h
@@ -15,6 +15,7 @@
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Support/Compiler.h"
#include <memory>
#include <optional>
@@ -62,7 +63,8 @@ parseAssemblyFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// parsing.
LLVM_ABI std::unique_ptr<Module>
parseAssemblyString(StringRef AsmString, SMDiagnostic &Err,
- LLVMContext &Context, SlotMapping *Slots = nullptr);
+ LLVMContext &Context, SlotMapping *Slots = nullptr,
+ AsmParserContext *ParserContext = nullptr);
/// Holds the Module and ModuleSummaryIndex returned by the interfaces
/// that parse both.
@@ -128,9 +130,9 @@ parseSummaryIndexAssemblyString(StringRef AsmString, SMDiagnostic &Err);
LLVM_ABI std::unique_ptr<Module> parseAssembly(
MemoryBufferRef F, SMDiagnostic &Err, LLVMContext &Context,
SlotMapping *Slots = nullptr,
- DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
- return std::nullopt;
- });
+ DataLayoutCallbackTy DataLayoutCallback =
+ [](StringRef, StringRef) { return std::nullopt; },
+ AsmParserContext *ParserContext = nullptr);
/// Parse LLVM Assembly including the summary index from a MemoryBuffer.
///
@@ -169,9 +171,9 @@ parseSummaryIndexAssembly(MemoryBufferRef F, SMDiagnostic &Err);
LLVM_ABI bool parseAssemblyInto(
MemoryBufferRef F, Module *M, ModuleSummaryIndex *Index, SMDiagnostic &Err,
SlotMapping *Slots = nullptr,
- DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
- return std::nullopt;
- });
+ DataLayoutCallbackTy DataLayoutCallback =
+ [](StringRef, StringRef) { return std::nullopt; },
+ AsmParserContext *ParserContext = nullptr);
/// Parse a type and a constant value in the given string.
///
diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h
index 04d0391c04098..9e27797d151e2 100644
--- a/llvm/include/llvm/IR/Value.h
+++ b/llvm/include/llvm/IR/Value.h
@@ -55,6 +55,38 @@ class User;
using ValueName = StringMapEntry<Value *>;
+struct FileLoc {
+ unsigned Line;
+ unsigned Col;
+
+ bool operator<=(const FileLoc &RHS) const {
+ return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
+ }
+
+ bool operator<(const FileLoc &RHS) const {
+ return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
+ }
+
+ FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}
+};
+
+struct FileLocRange {
+ FileLoc Start;
+ FileLoc End;
+
+ FileLocRange() : Start(0, 0), End(0, 0) {}
+
+ FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
+ assert(Start <= End);
+ }
+
+ bool contains(FileLoc L) const { return Start <= L && L <= End; }
+
+ bool contains(FileLocRange LR) const {
+ return contains(LR.Start) && contains(LR.End);
+ }
+};
+
//===----------------------------------------------------------------------===//
// Value Class
//===----------------------------------------------------------------------===//
diff --git a/llvm/include/llvm/IRReader/IRReader.h b/llvm/include/llvm/IRReader/IRReader.h
index 790140f19934e..00cf12d342ae0 100644
--- a/llvm/include/llvm/IRReader/IRReader.h
+++ b/llvm/include/llvm/IRReader/IRReader.h
@@ -15,6 +15,7 @@
#define LLVM_IRREADER_IRREADER_H
#include "llvm/ADT/StringRef.h"
+#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/Support/Compiler.h"
#include <memory>
@@ -50,19 +51,19 @@ getLazyIRFileModule(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// for it. Otherwise, attempt to parse it as LLVM Assembly and return
/// a Module for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIR(MemoryBufferRef Buffer,
- SMDiagnostic &Err,
- LLVMContext &Context,
- ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIR(MemoryBufferRef Buffer, SMDiagnostic &Err, LLVMContext &Context,
+ ParserCallbacks Callbacks = {},
+ AsmParserContext *ParserContext = nullptr);
/// If the given file holds a bitcode image, return a Module for it.
/// Otherwise, attempt to parse it as LLVM Assembly and return a Module
/// for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIRFile(StringRef Filename,
- SMDiagnostic &Err,
- LLVMContext &Context,
- ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIRFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
+ ParserCallbacks Callbacks = {},
+ AsmParserContext *ParserContext = nullptr);
}
#endif
diff --git a/llvm/lib/AsmParser/AsmParserContext.cpp b/llvm/lib/AsmParser/AsmParserContext.cpp
new file mode 100644
index 0000000000000..f5e3d83f5d346
--- /dev/null
+++ b/llvm/lib/AsmParser/AsmParserContext.cpp
@@ -0,0 +1,91 @@
+//===-- AsmParserContext.cpp ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/AsmParser/AsmParserContext.h"
+
+namespace llvm {
+
+std::optional<FileLocRange>
+AsmParserContext::getFunctionLocation(const Function *F) const {
+ if (!Functions.contains(F))
+ return std::nullopt;
+ return Functions.at(F);
+}
+
+std::optional<FileLocRange>
+AsmParserContext::getBlockLocation(const BasicBlock *BB) const {
+ if (!Blocks.contains(BB))
+ return std::nullopt;
+ return Blocks.at(BB);
+}
+
+std::optional<FileLocRange>
+AsmParserContext::getInstructionLocation(const Instruction *I) const {
+ if (!Instructions.contains(I))
+ return std::nullopt;
+ return Instructions.at(I);
+}
+
+std::optional<Function *>
+AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
+ for (auto &[F, Loc] : Functions) {
+ if (Loc.contains(Query))
+ return F;
+ }
+ return std::nullopt;
+}
+
+std::optional<Function *>
+AsmParserContext::getFunctionAtLocation(const FileLoc &Query) const {
+ return getFunctionAtLocation(FileLocRange(Query, Query));
+}
+
+std::optional<BasicBlock *>
+AsmParserContext::getBlockAtLocation(const FileLocRange &Query) const {
+ for (auto &[BB, Loc] : Blocks) {
+ if (Loc.contains(Query))
+ return BB;
+ }
+ return std::nullopt;
+}
+
+std::optional<BasicBlock *>
+AsmParserContext::getBlockAtLocation(const FileLoc &Query) const {
+ return getBlockAtLocation(FileLocRange(Query, Query));
+}
+
+std::optional<Instruction *>
+AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const {
+ for (auto &[I, Loc] : Instructions) {
+ if (Loc.contains(Query))
+ return I;
+ }
+ return std::nullopt;
+}
+
+std::optional<Instruction *>
+AsmParserContext::getInstructionAtLocation(const FileLoc &Query) const {
+ return getInstructionAtLocation(FileLocRange(Query, Query));
+}
+
+bool AsmParserContext::addFunctionLocation(Function *F,
+ const FileLocRange &Loc) {
+ return Functions.insert({F, Loc}).second;
+}
+
+bool AsmParserContext::addBlockLocation(BasicBlock *BB,
+ const FileLocRange &Loc) {
+ return Blocks.insert({BB, Loc}).second;
+}
+
+bool AsmParserContext::addInstructionLocation(Instruction *I,
+ const FileLocRange &Loc) {
+ return Instructions.insert({I, Loc}).second;
+}
+
+} // namespace llvm
diff --git a/llvm/lib/AsmParser/CMakeLists.txt b/llvm/lib/AsmParser/CMakeLists.txt
index 20d0c50a029ca..dcfcc06f093a7 100644
--- a/llvm/lib/AsmParser/CMakeLists.txt
+++ b/llvm/lib/AsmParser/CMakeLists.txt
@@ -1,5 +1,6 @@
# AsmParser
add_llvm_component_library(LLVMAsmParser
+ AsmParserContext.cpp
LLLexer.cpp
LLParser.cpp
Parser.cpp
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 3d5bd6155536e..be5b2b9bce0ca 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -155,15 +155,6 @@ static bool isLabelChar(char C) {
C == '.' || C == '_';
}
-/// isLabelTail - Return true if this pointer points to a valid end of a label.
-static const char *isLabelTail(const char *CurPtr) {
- while (true) {
- if (CurPtr[0] == ':') return CurPtr+1;
- if (!isLabelChar(CurPtr[0])) return nullptr;
- ++CurPtr;
- }
-}
-
//===----------------------------------------------------------------------===//
// Lexer definition.
//===----------------------------------------------------------------------===//
@@ -174,27 +165,75 @@ LLLexer::LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &Err,
CurPtr = CurBuf.begin();
}
+const char *LLLexer::getLabelTail(const char *Ptr) {
+ while (Ptr != CurBuf.end()) {
+ if (Ptr[0] == ':')
+ return Ptr + 1;
+ if (!isLabelChar(Ptr[0]))
+ return nullptr;
+ ++Ptr;
+ }
+ return nullptr;
+}
+
int LLLexer::getNextChar() {
- char CurChar = *CurPtr++;
- switch (CurChar) {
- default: return (unsigned char)CurChar;
- case 0:
- // A nul character in the stream is either the end of the current buffer or
- // a random nul in the file. Disambiguate that here.
- if (CurPtr-1 != CurBuf.end())
- return 0; // Just whitespace.
-
- // Otherwise, return end of file.
- --CurPtr; // Another call to lex will return EOF again.
+ if (CurPtr == CurBuf.end())
return EOF;
+ // Increment line number if this is the first character after a newline
+ if (CurPtr > CurBuf.begin() && *(CurPtr - 1) == '\n') {
+ CurLineNum++;
+ CurColNum = 0;
+ } else
+ CurColNum++;
+ return *CurPtr++;
+}
+
+const char *LLLexer::skipNChars(unsigned N) {
+ while (N--)
+ getNextChar();
+ return CurPtr;
+}
+
+void LLLexer::advancePositionTo(const char *Ptr) {
+ bool RecalculateColumn = false;
+ while (CurPtr != Ptr) {
+ if (CurPtr > Ptr) {
+ --CurPtr;
+ --CurColNum;
+ // Since CurPtr is one char ahead of the stored position, check if the
+ // previous char is not a newline
+ if (CurPtr != CurBuf.begin() && *(CurPtr - 1) == '\n') {
+ --CurLineNum;
+ RecalculateColumn = true;
+ }
+ } else
+ getNextChar();
+ }
+ if (RecalculateColumn) {
+ CurColNum = 0;
+ // Count the number of chars to the previous newline or start of buffer
+ for (const char *Ptr = CurPtr; Ptr != CurBuf.begin() && *(Ptr - 1) != '\n';
+ --Ptr, ++CurColNum)
+ ;
}
}
lltok::Kind LLLexer::LexToken() {
+ // Set token end to next location, since the end is
+ // exclusive
+ if (CurPtr != CurBuf.begin() && *(CurPtr - 1) == '\n') {
+ PrevTokEndLineNum = CurLineNum + 1;
+ PrevTokEndColNum = 0;
+ } else {
+ PrevTokEndLineNum = CurLineNum;
+ PrevTokEndColNum = CurColNum + 1;
+ }
while (true) {
TokStart = CurPtr;
-
int CurChar = getNextChar();
+ CurTokColNum = CurColNum;
+ CurTokLineNum = CurLineNum;
+
switch (CurChar) {
default:
// Handle letters: [a-zA-Z_]
@@ -215,13 +254,13 @@ lltok::Kind LLLexer::LexToken() {
case '%': return LexPercent();
case '"': return LexQuote();
case '.':
- if (const char *Ptr = isLabelTail(CurPtr)) {
- CurPtr = Ptr;
+ if (const char *Ptr = getLabelTail(CurPtr)) {
+ advancePositionTo(Ptr);
StrVal.assign(TokStart, CurPtr-1);
return lltok::LabelStr;
}
if (CurPtr[0] == '.' && CurPtr[1] == '.') {
- CurPtr += 2;
+ skipNChars(2);
return lltok::dotdotdot;
}
return lltok::Error;
@@ -298,15 +337,15 @@ lltok::Kind LLLexer::LexAt() {
}
lltok::Kind LLLexer::LexDollar() {
- if (const char *Ptr = isLabelTail(TokStart)) {
- CurPtr = Ptr;
+ if (const char *Ptr = getLabelTail(TokStart)) {
+ advancePositionTo(Ptr);
StrVal.assign(TokStart, CurPtr - 1);
return lltok::LabelStr;
}
// Handle DollarStringConstant: $\"[^\"]*\"
if (CurPtr[0] == '"') {
- ++CurPtr;
+ getNextChar();
while (true) {
int CurChar = getNextChar();
@@ -358,11 +397,11 @@ bool LLLexer::ReadVarName() {
if (isalpha(static_cast<unsigned char>(CurPtr[0])) ||
CurPtr[0] == '-' || CurPtr[0] == '$' ||
CurPtr[0] == '.' || CurPtr[0] == '_') {
- ++CurPtr;
+ getNextChar();
while (isalnum(static_cast<unsigned char>(CurPtr[0])) ||
CurPtr[0] == '-' || CurPtr[0] == '$' ||
CurPtr[0] == '.' || CurPtr[0] == '_')
- ++CurPtr;
+ getNextChar();
StrVal.assign(NameStart, CurPtr);
return true;
@@ -376,7 +415,8 @@ lltok::Kind LLLexer::LexUIntID(lltok::Kind Token) {
if (!isdigit(static_cast<unsigned char>(CurPtr[0])))
return lltok::Error;
- for (++CurPtr; isdigit(static_cast<unsigned char>(CurPtr[0])); ++CurPtr)
+ for (getNextChar(); isdigit(static_cast<u...
[truncated]
|
std::optional<Function *> getFunctionAtLocation(const FileLocRange &) const; | ||
std::optional<Function *> getFunctionAtLocation(const FileLoc &) const; | ||
std::optional<BasicBlock *> getBlockAtLocation(const FileLocRange &) const; | ||
std::optional<BasicBlock *> getBlockAtLocation(const FileLoc &) const; | ||
std::optional<Instruction *> | ||
getInstructionAtLocation(const FileLocRange &) const; | ||
std::optional<Instruction *> getInstructionAtLocation(const FileLoc &) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use nullptr here to indicate absence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (!Blocks.contains(BB)) | ||
return std::nullopt; | ||
return Blocks.at(BB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps we can use DenseMap::lookup_or
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that suggestion, makes the code a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it sadly is not possible. Because the return type of lookup_or is the value type of the map, so it can't return a nullopt
dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":" \ | ||
<< Loc1.Start.Col << " - " << Loc1.End.Line << ":" \ | ||
<< Loc1.End.Col << "\n"; \ | ||
dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":" \ | ||
<< Loc2.Start.Col << " - " << Loc2.End.Line << ":" \ | ||
<< Loc2.End.Col << "\n"; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap these line with LLVM_DEBUG. Though it's a bit unusual to have debug prints in a unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapped it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtest has it's own support for printing additional debugging information if an expectation fails. In the simplest case you can write something like ASSERT_TRUE(AreLocsEqual) << ...
. As this test is not hot, you shouldn't need the more sophisticated methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also EXPECT seems more appropriate than ASSERT here.
Gentle ping @mshockwave could you please have a look at the changes I did |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. Could you tag other potential reviewers (and I'll help you to add them as I figured you probably don't have the permissions to do that yet) to give another pass?
I have not really an idea, most code this PR changes is from the initial lexer from Chris |
#include "llvm/Support/SourceMgr.h" | ||
#include "gtest/gtest.h" | ||
|
||
#define DEBUG_TYPE "Unittest-asm-parser-tests" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's relatively rare to capitalize DEBUG_TYPE nowadays
#define DEBUG_TYPE "Unittest-asm-parser-tests" | |
#define DEBUG_TYPE "unittest-asm-parser-tests" |
/// | ||
/// This information is optionally emitted by the LLParser while | ||
/// it reads LLVM textual IR. | ||
class AsmParserContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this class makes it sound like something that should be owned by the LLParser. Maybe something like LLParserLocationInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AsmParserContext is better, as it's more future proof. In the future we might want to add some more info, other than location. As for the LLParser vs AsmParser, I think AsmParser is better since it uses what is parsed other then how it's parserd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is supposed to be used similarly to AsmParserState in MLIR, In tools, that need to know some things from the parser, in this case we want to use it to be able to access locations or instructions, functions, ... in IR files for the purposes of navigating in the file via a LSP server (such as goto definition/references)
@@ -0,0 +1,70 @@ | |||
//===-- AsmParserContext.h --------------------------------------*- C++ -*-===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omit file name and emacs marker in new files.
/// | ||
/// This information is optionally emitted by the LLParser while | ||
/// it reads LLVM textual IR. | ||
class AsmParserContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?
Function * | ||
AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const { | ||
for (auto &[F, Loc] : Functions) { | ||
if (Loc.contains(Query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very inefficient...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not that bad. When testing this even on large files the LSP was pretty responsive. So I wouldn't consider this an issue for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the changes in this file still necessary, given that you how determine the line/column based on the pointer, rather than updating it on the fly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes are needed, to track the locations of previous tokens.
Changing the manual pointer increments to getNextChar() is not crutial to this PR now, altho it would bring the benefit of eliminating some OOB errors. They were actually filed first in a separate PR #152103
I'm willing to remove them from this PR if requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that some of these changes still make sense, but with the "everything must go through getNextChar" constraint removed, some things here are pretty weird now, like how skipNChars() works by doing N individual increments. I'd prefer dropping these changes. If there are OOB issues, we should address them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the changes from this PR
dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":" \ | ||
<< Loc1.Start.Col << " - " << Loc1.End.Line << ":" \ | ||
<< Loc1.End.Col << "\n"; \ | ||
dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":" \ | ||
<< Loc2.Start.Col << " - " << Loc2.End.Line << ":" \ | ||
<< Loc2.End.Col << "\n"; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtest has it's own support for printing additional debugging information if an expectation fails. In the simplest case you can write something like ASSERT_TRUE(AreLocsEqual) << ...
. As this test is not hot, you shouldn't need the more sophisticated methods.
dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":" \ | ||
<< Loc1.Start.Col << " - " << Loc1.End.Line << ":" \ | ||
<< Loc1.End.Col << "\n"; \ | ||
dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":" \ | ||
<< Loc2.Start.Col << " - " << Loc2.End.Line << ":" \ | ||
<< Loc2.End.Col << "\n"; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also EXPECT seems more appropriate than ASSERT here.
|
||
TEST(AsmParserTest, ParserObjectLocations) { | ||
// Expected to fail with function location starting one character later, needs | ||
// a fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unclear on what this comment refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was left there from a development iteration where the test was failing, removed the comment
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
I hope I have addressed / responded all of the issues that nikic addressed |
|
||
Instruction * | ||
AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const { | ||
for (auto &[I, Loc] : Instructions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over all instructions seems rather inefficient e.g. if you're trying to get info towards the end of large file. Can't we have a list sorted by file location and do a binary search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it doesn't matter if the lookup by function is more common. Do you have a link to code using this new class?
I'm not worried about performance for the LSP, optimizations can always be done later. Please don't interpret my review comments as negative towards this change, I am very excited to see this support land eventually.
This PR is part of the LLVM IR LSP server project (RFC)
To be able to make a LSP server, it's crutial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions).
This PR adds:
The AsmParserContext can be passed as an optional parameter into the parser. Which populates it and it can be then used by other tools, such as the LSP server.
The AsmParserContext idea was borrowed from MLIR. As we didn't want to store data no one else uses inside the objects themselves. But the implementation is different, this class holds several maps of Functions, BasicBlocks and Instructions, to map them to their location.
The Lexer had all of it's manual pointer increments changed to getNextChar() which checks for linebreaks and updates the line and column counters accordingly.
And some utility methods were added to get the positions of the processed tokens.